Exploiting Word Transformation in Statistical Machine Translation from Spanish to English

Author

  • Deepa Gupta
Abstract

This paper investigates the use of morphosyntactic information to reduce data sparseness in statistical machine translation from Spanish to English. In particular, word-alignment training is performed after applying different word transformations based on lemmas and stems. Stem-based training is observed to outperform lemma-based training when up to 1 million running words of data are used. A new word-alignment training technique is also proposed that applies syntactically motivated constraints to the parallel data. Preliminary experimental results show that stem-based training with syntactically motivated constraints gives a significant improvement in translation performance. Finally, a technique to reduce the impact of out-of-vocabulary words is discussed. The task considered is the translation of Plenary Sessions of the European Parliament.
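
As a rough illustration of why transforming words into stems can reduce data sparseness before word-alignment training, the sketch below maps Spanish surface forms to crude stems and compares vocabulary sizes. It is not the method used in the paper: `toy_stem`, `SUFFIXES`, and the three example sentences are hypothetical and exist only for this illustration. A real system would rely on a proper Spanish stemmer or lemmatizer.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for illustration,
# not a real Spanish stemmer and not taken from the paper).
SUFFIXES = ("amos", "aron", "ando", "aba", "as", "an", "ar", "es", "a", "o", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem_len:
            return w[: -len(suffix)]
    return w


def transform_corpus(sentences):
    """Replace every whitespace-separated token by its toy stem."""
    return [[toy_stem(tok) for tok in sent.split()] for sent in sentences]


def vocabulary(tokenized):
    """Count how often each word type occurs in a tokenized corpus."""
    return Counter(tok for sent in tokenized for tok in sent)


# Made-up source-side sentences; the alignment model would be trained on the
# transformed text instead of the surface forms.
spanish = [
    "hablamos de la propuesta",
    "hablaron de las propuestas",
    "hablan de la comisión",
]

surface_vocab = vocabulary([s.split() for s in spanish])
stem_vocab = vocabulary(transform_corpus(spanish))

# Fewer distinct types means more observations per type for alignment training.
print(len(surface_vocab), len(stem_vocab))
```

In this toy corpus the three inflected forms of "hablar" collapse into a single type, so each alignment parameter is estimated from more occurrences. The same idea applies to lemmas, with the transformation function swapped out.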

Similar resources

Grouping Multi-Word Expressions According To Part-Of-Speech In Statistical Machine Translation

This paper studies a strategy for identifying and using multi-word expressions in Statistical Machine Translation. The performance of the proposed strategy for various types of multi-word expressions (like nouns or verbs) is evaluated in terms of alignment quality as well as translation accuracy. Evaluations are performed by using real-life data, namely the European Parliament corpus. Results f...

The TCH machine translation system for IWSLT 2008

This paper reports on the first participation of TCH (Toshiba (China) Research and Development Center) at the IWSLT evaluation campaign. We participated in all the 5 translation tasks with Chinese as source language or target language. For Chinese-English and English-Chinese translation, we used hybrid systems that combine rule-based machine translation (RBMT) method and statistical machine tra...

Data Inferred Multi-word Expressions for Statistical Machine Translation

This paper presents a strategy for detecting and using multi-word expressions in Statistical Machine Translation. Performance of the proposed strategy is evaluated in terms of alignment quality as well as translation accuracy. Evaluations are performed by using the Verbmobil corpus. Results from translation tasks from English-to-Spanish and from Spanish-to-English are presented and discussed.

Improving Word Alignment with Language Model Based Confidence Scores

This paper describes the statistical machine translation systems submitted to the ACL-WMT 2008 shared translation task. Systems were submitted for two translation directions: English→Spanish and Spanish→English. Using sentence pair confidence scores estimated with source and target language models, improvements are observed on the NewsCommentary test sets. Genre-dependent sentence pair confiden...

Towards the Use of Word Stems and Suffixes for Statistical Machine Translation

In this paper we present methods for improving the quality of translation from an inflected language into English by making use of part-of-speech tags and word stems and suffixes in the source language. Results for translations from Spanish and Catalan into English are presented on the LC-STAR trilingual corpus which consists of spontaneously spoken dialogues in the domain of travelling and app...


Publication date: 2006